Add protobuf support for lambdas#22362
Conversation
| return Err(Error::General( | ||
| "Proto serialization error: Lambda not implemented".to_string(), | ||
| )); | ||
| Expr::HigherOrderFunction(HigherOrderFunction { func, args }) => { |
There was a problem hiding this comment.
super nit: instead of appending this, can we move this next to the rest of functions? (Scalar, Aggregate etc)
There was a problem hiding this comment.
Much easier to cross check the impls now 5688f3f thanks
(also did this to others similar matchs)
| HigherOrderUDFExprNode higher_order_udf_expr = 37; | ||
| Lambda lambda = 38; | ||
| LambdaVariable lambda_variable = 39; |
There was a problem hiding this comment.
super nit as well: Can we move this next to the other UDFs?
There was a problem hiding this comment.
Isn't better being ordered by field id? so the next one who add a node can spot the next available field id right away
| expr_id, | ||
| expr_type: Some(protobuf::physical_expr_node::ExprType::LambdaVariable( | ||
| PhysicalLambdaVariableExprNode { | ||
| index: var.index() as u32, |
There was a problem hiding this comment.
for now we don't use this so it is always 0 no?
There was a problem hiding this comment.
This is used now after lambda capture got merged. The variable below has index 1, for example:
| protobuf::PhysicalHigherOrderUdfNode { | ||
| name: expr.name().to_string(), | ||
| args: serialize_physical_exprs(expr.args(), codec, proto_converter)?, | ||
| fun_definition: (!buf.is_empty()).then_some(buf), | ||
| }, |
There was a problem hiding this comment.
Shouldn't we encode the return_field instead of resolving it from the schema when decoding (like ScalarUDF)?
Not really sure what would be the exact disadvantage of keeping it at is, but wondering if it would be cleaner to make the return type explicit.
There was a problem hiding this comment.
I removed the HigherOrderFunctionExpr::new, the only constructor that receives return_field as arg, from the lambda PR, due to possible type mismatch, so it could be discussed on it's own PR when needed. See #21679 (review) and
FYI, the return field might not match the function output after the new arguments, but you don't have the schema here so you cant check that and I see ScalarFunctionExpr have the same problem
from https://github.com/apache/datafusion/pull/21679/changes#r3109735988 (quoting directly since the github link anchor doesn't work well)
| let serialize_name = extract_function_name(&expr); | ||
| let deserialize_name = extract_function_name(&deserialize); | ||
|
|
||
| assert_eq!(serialize_name, deserialize_name); |
There was a problem hiding this comment.
should we assert directly on expr and deserialize instead? since the Hofs implement PartialEq I think it should work
assert_eq!(expr, deserialized_expr);
There was a problem hiding this comment.
Yes, 9a3f30c thanks
This was based on test_expression_serialization_roundtrip which checks only names but asserting directly on expr is definitively better
Which issue does this PR close?
Part of #21172
Rationale for this change
Protobuf support wasn't implemented in main lambda PR to not make it even bigger
What changes are included in this PR?
Protobuf encoding and decoding (~1000 LOC in generated files, ~210 impl, ~400 tests)
Are these changes tested?
Unit tests, similar to the existing ones for scalar functions
Are there any user-facing changes?
None